UNL CSCE 830 - Course Project


Team: A-Team


Members:

Vlad Chiriacescu
Sergio Rico
Ertong Zhang
Zhongyin Zhang


Meeting date: Saturday, 11.15 AM


Meeting 1:

1) All of us installed the blaster driver.

2) Ran a clock example to see the steps that need to be performed in Quartus and
to get used to the board output.

3) Established the bubble sort algorithm in assembly. Formed rough ideas about the number
of instructions and the specific instructions that will form our ISA.

4) Until the next meeting we will try to get through a VHDL tutorial and maybe part
of an assembly language one.

Update 1:

1) We figured out how to install the RS232 driver on Windows 7.

Meeting 2:

1) We came up with 3 different versions of the assembly code for bubble sort.
We compared them and then chose our ISA.

2) We simulated the execution of a few instructions from the single cycle example.

3) Until the next meeting we will try to implement the assembler needed.

4) Some of us will continue looking at the single cycle example to understand it better
and even come up with some new parts needed for some of our instructions.

Meeting 3:

1) We came up with an improved version of the single cycle processor.
2) We successfully ran the serial communication example.

Milestone 2:

1) Instruction set architecture

The ISA we decided on consists of the following subset of MIPS instructions:

Add immediate (addi)
Load word (lw)
Set on less than (signed) (slt)
Branch on equal (beq)
Store word (sw)

We also created an instruction called "uart" for communication with the UART RS232 transceiver.
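
As an illustration of what the assembler has to produce for these instructions, here is a minimal Python sketch of the bit packing. It assumes the standard MIPS32 opcodes and field layout; whether our processor keeps exactly these encodings is an assumption. The helper names encode_i and encode_slt are hypothetical, and the encoding of "uart" is not given here, so it is left out.

```python
# Standard MIPS32 opcodes for the I-type instructions in the list above.
OPCODES = {"addi": 0x08, "lw": 0x23, "sw": 0x2B, "beq": 0x04}
SLT_FUNCT = 0x2A  # slt is R-type: opcode 0, function code 0x2A


def encode_i(op, rs, rt, imm):
    """Pack an I-type instruction: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (OPCODES[op] << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_slt(rd, rs, rt):
    """Pack slt rd, rs, rt: opcode 0 | rs | rt | rd | shamt 0 | funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | SLT_FUNCT


print(f"{encode_i('addi', 0, 8, 5):08X}")  # addi $8, $0, 5
print(f"{encode_slt(8, 9, 10):08X}")       # slt $8, $9, $10
```

Negative immediates (for example backward branch offsets) are truncated to 16 bits in two's complement by the mask in encode_i.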

2) Test data

The data part of our MIF data file is:

[00..FF]: 00000000;

00 : 00000006;
01 : 55555555;
02 : AAAAAAAA;
03 : 00000010;
04 : 00000001;
05 : F0000000;
06 : 12300000;

This data has been used as input to the sorting algorithm run on the processor.
Verification has been done with ModelSim and also directly on the board.
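
The data lines above can be generated from a list of words. The following Python sketch is only an illustration: the function name mif_data_lines is hypothetical, and it emits just the content lines shown above, not the header of a complete MIF file.

```python
TEST_DATA = [0x00000006, 0x55555555, 0xAAAAAAAA, 0x00000010,
             0x00000001, 0xF0000000, 0x12300000]


def mif_data_lines(words, depth=0x100):
    """Return the default-fill line followed by one 'addr : word;' line per word."""
    lines = [f"[00..{depth - 1:02X}]: 00000000;", ""]
    lines += [f"{addr:02X} : {word:08X};" for addr, word in enumerate(words)]
    return lines


print("\n".join(mif_data_lines(TEST_DATA)))
```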

3) Top level design

Entity MIPS with components:

Component Ifetch
Component Idecode
Component control
Component execute
Component dmemory

4) Serial communication

In the last few days, we managed to integrate the UART example with
our processor and displayed the sorting result in the PuTTY terminal.

Milestone 3 - pipelining the processor

1) For this step we placed latches between each pair of adjacent processor stages in order to
form the pipeline.

2) We added NOPs to the bubble sort algorithm in order to avoid data and control hazards.

3) We modified our assembler program so that it can correctly transform the new
bubble sort code into the correct MIF file.

4) We successfully simulated the new project using ModelSim.

5) The tests on the board passed successfully. Here are our latest results, as shown
in the PuTTY terminal:

AAAAAAAA
F0000000
00000001
00000010
12300000
55555555
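
The order above is what an ascending sort gives when the words are compared as signed 32-bit values, which matches the signed slt comparison. A short Python check, assuming that the word at address 00 (00000006) is the element count and that addresses 01 to 06 hold the values to sort:

```python
def to_signed32(word):
    """Interpret a 32-bit word as a two's complement signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


# Words at addresses 01..06 of the test data.
values = [0x55555555, 0xAAAAAAAA, 0x00000010,
          0x00000001, 0xF0000000, 0x12300000]

sorted_words = sorted(values, key=to_signed32)
for word in sorted_words:
    print(f"{word:08X}")
```

AAAAAAAA and F0000000 come first because their top bit is set, so they are negative when read as signed values.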


Milestone 4 - Data forwarding

Before data is sent to the execution (EXE) stage, the ID stage asks the EXE, DM and WB stages
to send their current data. The ID stage then checks the data received from those stages for
dependencies with the instruction being decoded. Where a dependency is found, the ID stage replaces
the data in the two registers A and B accordingly, which avoids many of the possible data hazards.
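
The dependency check can be sketched in Python as follows. This is a simplified model, not our VHDL: the names StageResult and forward_operand are hypothetical, and the priority order (EXE first, then DM, then WB, so that the most recent producer wins) is an assumption about how conflicts are resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageResult:
    dest: Optional[int]  # destination register, or None if nothing is written
    value: int           # data currently held by that stage


def forward_operand(reg, regfile_value, exe, dm, wb):
    """Choose the value for source register `reg` in the ID stage."""
    if reg == 0:
        return 0  # register $0 is hardwired to zero in MIPS
    for stage in (exe, dm, wb):  # youngest producer first (assumed priority)
        if stage.dest == reg:
            return stage.value
    return regfile_value  # no dependency: use the register file value


exe = StageResult(dest=5, value=111)
dm = StageResult(dest=5, value=222)
wb = StageResult(dest=7, value=333)
a = forward_operand(5, 0, exe, dm, wb)  # taken from EXE
b = forward_operand(7, 0, exe, dm, wb)  # taken from WB
```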

Load instruction issue


The LW instruction does not obtain the required data until it reaches the Data Memory (DM)
stage. With the data forwarding mechanism presented above, the best that can be achieved is to
forward the EXE output of one instruction to the EXE input of the next instruction in case of data
dependencies. That mechanism alone cannot resolve the data dependency between
a load (LW) instruction and the next instruction, since we would need the LW output at the EXE input
of the next instruction. This happens in all situations except when the next instruction is a store instruction.


Our solution for this issue is simple and straightforward: we introduce a bubble in the pipeline.


The detailed process is as follows:


1. We decode the instruction in the IF stage, so we already know in the IF stage whether it is an LW (load).

2. If the instruction is a load instruction, we add a bubble, even though we do not know whether there is
a hazard between this load instruction and the next instruction.

3. The pipeline is therefore stalled for one cycle. Since a bubble is sent in that cycle, we do not need to fetch a new instruction.
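
The effect of these steps on the instruction stream can be modelled with a few lines of Python. This is an illustrative model only: the function name and the textual instruction format are hypothetical, and the bubble is represented as a "nop" entry.

```python
NOP = "nop"


def issue_with_load_bubbles(program):
    """Return the stream entering the pipeline, with one bubble after every lw."""
    issued = []
    for instr in program:
        issued.append(instr)
        if instr.split()[0] == "lw":
            # The bubble is inserted unconditionally, without checking
            # whether the next instruction really depends on the load.
            issued.append(NOP)
    return issued


program = ["lw $1, 0($2)", "addi $3, $1, 1", "sw $3, 0($2)"]
stream = issue_with_load_bubbles(program)
```

In this model each load costs exactly one extra cycle, whether or not the following instruction uses the loaded value.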


Milestone 5 - Cache


Parameter description





Parameter                    Value
Block size                   4*32B
Associativity                1
Number of blocks in cache    16
Total                        2 KB
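
These parameters determine how an address is split into tag, index and offset. The Python sketch below assumes byte addressing and reads the block size as 4*32 = 128 bytes, which is the reading consistent with 16 blocks giving 2 KB in total. The function name split_address is hypothetical.

```python
BLOCK_BYTES = 4 * 32  # 128 bytes per block (assumed reading of "4*32B")
NUM_BLOCKS = 16       # direct mapped: one block per index

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1  # 7 bits select a byte in the block
INDEX_BITS = NUM_BLOCKS.bit_length() - 1    # 4 bits select the block


def split_address(addr):
    """Split a byte address into (tag, index, offset) for this cache geometry."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


total_bytes = BLOCK_BYTES * NUM_BLOCKS
print(total_bytes)                # 2048
print(split_address(0x1234))      # (2, 4, 52)
```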


Time sequence

Because we have to read the memory in case of a cache miss, we use a two-level clocking scheme.
The low-frequency clock controls the pipeline and the high-frequency clock
speeds up the transfer of data from memory to cache. Thus, we make sure we get the
right data even if there is a conflict miss and the data is not in the cache.


Cache miss

If the data is found in the cache, the access completes normally and the speed of a direct-mapped cache
is fully exploited (less time to search for data in the cache compared to higher associativity).
The problem arises when the processor tries to read data from the cache but the data is not found.
In this case, we fetch the data from memory. To do this, we take advantage of the two edges in each clock cycle.
On the falling edge, we check whether we need to read from or write to memory and, if so, perform the memory access.
On the rising edge, if the falling-edge check showed that a read or write is needed, we bring the data from memory into the
cache (in case of a read) or send the data to memory (in case of a write). After completing this process, we set the dirty and valid bits.

Write method

The write method used is write-back. Since the cache is large (2 KB) compared to the data required for the given problem,
there are few write-backs, and this method outperforms a write-through method, which would access memory on every store.
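
To illustrate how misses and write-backs are counted for a direct-mapped write-back cache, here is a small Python model. It is a sketch under assumptions, not our VHDL: it assumes byte addresses, 128-byte blocks, and that a write miss first loads the block into the cache (write-allocate). The class name is hypothetical.

```python
class DirectMappedWriteBackCache:
    def __init__(self, num_blocks=16, block_bytes=128):
        self.num_blocks = num_blocks
        self.block_bytes = block_bytes
        self.tags = [None] * num_blocks    # None marks an invalid line
        self.dirty = [False] * num_blocks
        self.accesses = self.misses = self.write_backs = 0

    def access(self, addr, is_write):
        block = addr // self.block_bytes
        index = block % self.num_blocks
        tag = block // self.num_blocks
        self.accesses += 1
        if self.tags[index] != tag:
            self.misses += 1
            # A valid dirty line must be written to memory before it is replaced.
            if self.tags[index] is not None and self.dirty[index]:
                self.write_backs += 1
            self.tags[index] = tag
            self.dirty[index] = False
        if is_write:
            self.dirty[index] = True  # memory is updated only on eviction


cache = DirectMappedWriteBackCache()
cache.access(0, is_write=True)      # cold miss, line becomes dirty
cache.access(4, is_write=False)     # hit in the same block
cache.access(2048, is_write=False)  # conflict miss, evicts the dirty line
```

With a working set smaller than the cache, dirty lines are rarely evicted, so the write-back count stays low.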

Results

Our final results include:
–Total Branch Count
–Total Mispredicted Branches
–Instruction Memory Access Count
–Instruction Cache Misses
–Data Memory Access Count
–Data Cache Misses
–Data Cache Write-Backs

Results.png